{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Notebook 4.2: Baichuan-13B"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2.1 概述\n",
    "\n",
    "本 Notebook 展示了如何使用[BigDL-LLM](https://github.com/intel-analytics/BigDL/tree/main/python/llm) API 在低成本 PC 上（无需独立显卡）运行[Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) 中文推理。Baichuan-13B 是百川智能科技继 [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B) 之后开发的一款开源、商业化的大规模语言模型。Baichuan-13B 还可在以下[链接](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)中的[Huggingface models](https://huggingface.co/models)中找到。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2.2 安装\n",
    "\n",
    "首先，在准备好的环境中安装 BigDL-LLM。有关环境配置的最佳实践，请参阅本教程的 [第二章](../ch_2_Environment_Setup/README.md)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --pre --upgrade bigdl-llm[all]\n",
    "\n",
    "# Baichuan-13B-Chat 进行生成所需的额外软件包\n",
    "!pip install -U transformers_stream_generator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "all 选项用于安装 BigDL-LLM 所需的其他软件包。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2.3 加载模型与 Tokenizer\n",
    "\n",
    "### 4.2.3.1 加载模型\n",
    "\n",
    "使用 BigDL-LLM API加载低精度优化（INT4）的 Baichuan 模型以降低资源成本，这会将模型中的相关层转换为 INT4 格式，\n",
    "\n",
    "\n",
    "> **注意**\n",
    ">\n",
    "> 您可以指定参数 `model_path` 为 Huggingface repo id 或本地模型路径。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bigdl.llm.transformers import AutoModelForCausalLM\n",
    "\n",
    "model_path = \"baichuan-inc/Baichuan-13B-Chat\"\n",
    "model = AutoModelForCausalLM.from_pretrained(model_path,\n",
    "                                             load_in_4bit=True,\n",
    "                                             trust_remote_code=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.3.2 加载 Tokenizer\n",
    "\n",
    "LLM 推理也需要一个 tokenizer。它用于将输入文本编码为张量，然后输入 LLM，并将 LLM 输出的张量解码为文本。您可以使用 [Huggingface transformers](https://huggingface.co/docs/transformers/index) API 直接加载 tokenizer。它可以与 BigDL-LLM 加载的模型无缝配合使用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_path,\n",
    "                                          trust_remote_code=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2.4 推理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.4.1 创建 Prompt 模板\n",
    "\n",
    "在生成之前，您需要创建一个 prompt 模板。这里我们给出了一个用于提问和回答的 prompt 模板示例。您也可以根据自己的模型调整 prompt。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "BAICHUAN_PROMPT_FORMAT = \"<human>{prompt} <bot>\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.4.2 生成\n",
    "\n",
    "接下来，您可以使用加载的模型与 tokenizer 生成输出。\n",
    "\n",
    "> **注意**\n",
    ">\n",
    "> `generate` 函数中的 `max_new_tokens` 参数定义了预测的最大 token 数量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-------------------- Output --------------------\n",
      "<human>AI是什么？ <bot>\n",
      "AI是人工智能（Artificial Intelligence）的缩写，它是指让计算机或其他设备模拟人类智能的技术。AI可以执行各种任务，如语音识别\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "import torch\n",
    "\n",
    "prompt = \"AI是什么？\"\n",
    "n_predict = 32\n",
    "with torch.inference_mode():\n",
    "        prompt = BAICHUAN_PROMPT_FORMAT.format(prompt=prompt)\n",
    "        input_ids = tokenizer.encode(prompt, return_tensors=\"pt\")\n",
    "        # 如果您选择的模型能够利用之前的 key/value attentions 来提高解码速度，\n",
    "        # 但其模型配置中的 `\"use_cache\": false`，\n",
    "        # 则必须在 `generate` 函数中明确设置 `use_cache=True`，\n",
    "        # 以便利用 BigDL-LLM INT4 优化获得最佳性能。\n",
    "        output = model.generate(input_ids,\n",
    "                                max_new_tokens=n_predict)\n",
    "        output_str = tokenizer.decode(output[0], skip_special_tokens=True)\n",
    "        print('-'*20, 'Output', '-'*20)\n",
    "        print(output_str)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llm-zcg",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
